A text-speech synchronization technique with applications to talking heads

Authors

  • Fabio Vignoli
  • Carlo Braccini
Abstract

In human communication, speech understanding is greatly improved by the bimodal acoustic-visual effect compared with speech alone, particularly when the communication takes place in noisy environments. In this paper we propose a novel procedure for synchronizing text and speech, intended to reduce the time needed to develop friendly audio-visual interfaces or authoring tools for multimedia production. The technique consists of neural-network-based processing of speech and a time alignment algorithm. The proposed algorithm is fast and speaker independent because it uses neural networks trained to discriminate among broad phoneme classes rather than to recognize speech. This technique has been used to animate the MPEG-4 compliant face model developed at DIST [3].
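
The abstract does not specify the alignment algorithm, so the following is only a minimal sketch of one common way to align a text-derived sequence of broad phoneme classes with per-frame classifier outputs: a monotonic dynamic-programming alignment. It is not the authors' method. The function name `align_classes`, the class labels (`sil`, `cons`, `vowel`), and the input layout (one dictionary of log-probabilities per speech frame) are all assumptions made for illustration.

```python
import math


def align_classes(frame_logprobs, class_seq):
    """Align speech frames to an expected sequence of broad classes.

    frame_logprobs: list of T dicts mapping class label -> log-probability
                    (hypothetical output of a per-frame classifier).
    class_seq:      list of N class labels expected from the text.
    Returns a list of (label, start_frame, end_frame_exclusive) segments.
    """
    T, N = len(frame_logprobs), len(class_seq)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per expected class")

    neg_inf = -math.inf
    # score[t][n]: best log-score with frame t assigned to position n.
    score = [[neg_inf] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    score[0][0] = frame_logprobs[0][class_seq[0]]

    for t in range(1, T):
        for n in range(min(t + 1, N)):
            stay = score[t - 1][n]                       # same segment
            move = score[t - 1][n - 1] if n > 0 else neg_inf  # next segment
            if move > stay:
                best, back[t][n] = move, n - 1
            else:
                best, back[t][n] = stay, n
            if best > neg_inf:
                score[t][n] = best + frame_logprobs[t][class_seq[n]]

    # Trace back from the last frame, which must end in the last class.
    path = [0] * T
    n = N - 1
    for t in range(T - 1, -1, -1):
        path[t] = n
        n = back[t][n]

    # Collapse the per-frame path into contiguous segments.
    segments = []
    start = 0
    for t in range(1, T + 1):
        if t == T or path[t] != path[t - 1]:
            segments.append((class_seq[path[start]], start, t))
            start = t
    return segments
```

Given the frame rate, the segment boundaries returned here convert directly to timestamps, which is the kind of timing information a face animation driver would need. Working with a few broad classes instead of full phoneme recognition keeps the per-frame classification problem small, which is consistent with the speed and speaker independence the abstract claims.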

Similar resources

Image-based Talking Head: Analysis and Synthesis

In this paper, our image-based talking head system is presented, which includes two parts: analysis and synthesis. In the analysis part, a subject reading a predefined corpus is recorded first. The recorded audio-visual data is analyzed in order to create a database containing a large number of normalized mouth images and their related information. The synthesis part generates natural looking t...

Generation of Personalized MPEG-4 compliant Talking Heads

This paper studies a new method for three-dimensional (3D) facial model adaptation and its integration into a Text-to-Speech (TTS) system. The TTS System pronounces, in real time, English or Greek speech and simultaneously animates the adapted face model, thus simulating a natural talking face. The 3D facial adaptation requires a set of two orthogonal views of the user’s face with a number of f...

A comparison of German talking heads in a smart home environment

The authors describe a newly developed German Text-To-audiovisual-Speech (TTavS) synthesis system based on the English speaking HeadZero. Targets of the control parameters of the talking head are generated by mapping German phonemes to the originally English visemic blend-shape controls. The resulting German version of HeadZero and the German talking head MASSY were extended to generate audi...

Face Analysis for the Synthesis of Photo-Realistic Talking Heads

This paper describes techniques for extracting bitmaps of facial parts from videos of a talking person. The goal is to synthesize photo-realistic talking heads of high quality that show picture-perfect appearance and realistic head movements with good lip-sound synchronization. For the synthesis of a talking head, bitmaps of facial parts are combined to form whole heads and then sequences of su...

Multimodal Speech Synthesis

Multimodal Speech Synthesis (“Talking Heads”) encompasses synthesis of speech from text (“Text-to-Speech”, TTS) plus synthesis of a visual presentation of a face that is lip-synced to the generated audio (“Visual TTS”, VTTS). Talking Heads are now practical because of the ever-increasing computing power and falling prices of computer hardware. This paper highlights recent technological breakthr...

Publication date: 1999